Conversation
|
Thanks for setting this up! Would be cool if the tests actually pass, but since this is not really relevant for what this PR is trying to achieve I would then still merge anyways. |
|
AMD CI is really backed up but we can just see what happens when it finally clears, I guess |
|
Ah, it turns out we need proper overloads for the allocator as well. I give this a quick try but I don't have access to any AMD machines so I can't actually try this out. |
|
I do have access so I'll give it a whirl tomorrow |
|
@kshyatt it seems like the AMD tests are still not all passing. Do you want me to wait with tagging a release to include this, or should I just go ahead? |
|
I'll take a look later today and if I can't sort it out let's just tag? |
|
This worked on 1.10 and 1.12 on AMD for me, let's see how CUDA does |
| # resolve conj flags and absorb into StridedView constructor to avoid type instabilities later on | ||
| if conjA | ||
| stridedtensoradd!(SV(C), conj(SV(A)), pA, α, β, backend, allocator) | ||
| stridedtensoradd!(SV(C), conj!(SV(A)), pA, α, β, backend, allocator) |
There was a problem hiding this comment.
Do we need this conj! there? It looks a bit suspicious to me, since I'm not sure we are really allowed to modify A, so we are really depending on the fact that conj(SV(A)) produces a view without modifying A.
There was a problem hiding this comment.
This "resolved" the AMD tests but caused the CUDA ones to fail (I was testing on an AMD machine), I thought conj! on a StridedView would also return simply a view but I guess not
|
Oh my god it's a difference between |
|
I think the problem for AMD on 1.10 is somewhere in https://github.com/QuantumKitHub/Strided.jl/blob/main/ext/StridedGPUArraysExt.jl#L129 I'll keep digging a bit |
|
OK, this is I think a problem in that kernel only on AMD and 1.10. Annoyingly you can't print from within a kernel on AMD super easily so I would say let's revert my last commit, merge this, and I'll keep working on the problem in Strided |
This reverts commit 5502697.
|
I'm not entirely following since I don't really know what error you are getting (correctness or actual runtime errors?), but I'm definitely okay with opening an issue in Strided for this, as it definitely seems like that is the problem, and not the specific TensorOperations implementation. |
|
It's correctness. The output of that kernel for the same inputs differs for AMD between 1.10 and 1.12 |
No description provided.